Simultaneous Reliability Evaluation of Generality and Accuracy for Rule Discovery in Databases
Abstract
This paper presents an algorithm for discovering conjunction rules with high reliability from data sets. The discovery of conjunction rules, each of which is a restricted form of a production rule, is well motivated by various useful applications such as semantic query optimization and automatic development of a knowledge base. In a discovery algorithm, a production rule is evaluated according to its generality and accuracy, since these are widely accepted as criteria in learning from examples. Reliability evaluation for these criteria is essential for distinguishing reliable rules from unreliable patterns without burdening users. However, previous discovery approaches have either ignored reliability evaluation or evaluated only the reliability of generality, and consequently tend to discover a huge number of rules. To circumvent these difficulties, we propose an approach based on simultaneous estimation. Our approach discovers the rules that exceed pre-specified thresholds for generality and accuracy with high reliability. A novel pruning method is employed to improve time efficiency without changing the discovery outcome. The proposed approach has been validated experimentally using 21 benchmark data sets from the UCI repository.
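To make the idea of thresholding generality and accuracy "with high reliability" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes generality is the fraction of records covered by the rule's premise and accuracy is the fraction of covered records that also satisfy the conclusion. In place of the paper's simultaneous estimation, it substitutes two one-sided normal-approximation lower bounds combined with a Bonferroni correction. The names `evaluate_rule`, `lower_bound`, and `delta` are hypothetical.

```python
import math
from statistics import NormalDist


def lower_bound(successes, trials, alpha):
    """One-sided normal-approximation lower confidence bound for a proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    z = NormalDist().inv_cdf(1.0 - alpha)
    return max(0.0, p - z * math.sqrt(p * (1.0 - p) / trials))


def matches(record, conditions):
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def evaluate_rule(records, premise, conclusion,
                  min_generality, min_accuracy, delta=0.05):
    """Check a conjunction rule `premise -> conclusion` against both thresholds.

    Assumption: a Bonferroni split (delta / 2 per criterion) stands in for
    the simultaneous estimation described in the abstract.
    """
    n = len(records)
    covered = [r for r in records if matches(r, premise)]
    correct = [r for r in covered if matches(r, conclusion)]

    alpha = delta / 2.0  # each bound may fail with probability delta / 2
    gen_lb = lower_bound(len(covered), n, alpha)
    acc_lb = lower_bound(len(correct), len(covered), alpha)

    return {
        "generality": len(covered) / n if n else 0.0,
        "accuracy": len(correct) / len(covered) if covered else 0.0,
        "generality_lower": gen_lb,
        "accuracy_lower": acc_lb,
        # The rule is kept only if both lower bounds clear their thresholds.
        "reliable": gen_lb >= min_generality and acc_lb >= min_accuracy,
    }
```

Under this sketch, a rule whose point estimates barely exceed the thresholds on a small sample is rejected, because its lower bounds fall below them; this is the kind of filtering that reduces the number of discovered rules.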
Similar resources
The Relative generality and precision of Evidence Based Medical Information Resources in the Recovery of Diabetes Information
Background and Aim: Relative generality and precision are two important criteria for measuring the efficiency and performance of information retrieval systems. The aim of this study was to compare the integrity and location of evidence-based bases in the digital library of Hamedan University of Medical Sciences in data retrieval of diabetes. Methods: The design of this research is cross-sect...
Interestingness Measure for Mining Spatial Gene Expression Data using Association Rule
The search for interesting association rules is an important topic in knowledge discovery in spatial gene expression databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, they may produce a large number of rules, many of them are uninteresting. The challenge in as...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses, it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures like support, confidence and other interestingness criteria which are used for evaluating a rule, can be thought of as different objectives of association rule mining problem. Support count is the number of records, which satisfies all the conditions that exist in the rule. This objective represents the accuracy of the rules extracted from the da...
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, decision making process is intimately related to some factors which determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated Companies, where managers ne...